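Recent times have witnessed rapid development of machine learning algorithmic systems, especially in reinforcement learning, natural language processing, computer and robot vision, image processing, and speech and emotion processing and understanding. With the increasing importance and relevance of machine learning models, algorithms, and their applications, and with the emergence of more innovative use cases of deep learning and artificial intelligence, the current volume presents a few innovative research works and their real-world applications, such as stock trading, medical and healthcare systems, and software automation. The chapters in the book illustrate how machine learning and deep learning algorithms and models are designed, optimized, and deployed. The volume will be useful for advanced graduate and doctoral students, researchers, university faculty members, practicing data scientists and data engineers, professionals, and consultants working in the broad areas of machine learning, deep learning, and artificial intelligence.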
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
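The patch-based strategy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a 2D image stored as a list of rows; the function name `extract_patches` and the `patch` and `stride` parameters are hypothetical and not taken from the survey or from any surveyed method.

```python
from typing import List, Sequence


def extract_patches(image: Sequence[Sequence[float]], patch: int, stride: int) -> List[List[List[float]]]:
    """Cut a 2D image into square patches of side `patch`, moving `stride` pixels at a time.

    Illustrative assumption: border regions that do not fit a full patch are dropped.
    """
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - patch + 1, stride):
        for c in range(0, width - patch + 1, stride):
            # Slice `patch` rows, then `patch` columns from each of those rows.
            patches.append([list(row[c:c + patch]) for row in image[r:r + patch]])
    return patches


# Usage: a 4x4 image split into four non-overlapping 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = extract_patches(image, patch=2, stride=2)
```

A stride smaller than the patch size yields overlapping patches, which trades more training samples for more redundancy.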
We present a framework for ranking images within their class based on the strength of spurious cues present. By measuring the gap in accuracy between the highest and lowest ranked images (we call this the spurious gap), we assess spurious feature reliance for $89$ diverse ImageNet models, finding that even the best models underperform on images with weak spurious presence. However, the effect of spurious cues varies far more dramatically across classes, emphasizing the crucial, often overlooked, class-dependence of the spurious correlation problem. While most spurious features we observe are clarifying (i.e., improving test-time accuracy when present, as is typically expected), we surprisingly find many cases of confusing spurious features, where models perform better when they are absent. We then close the spurious gap by training new classification heads on lowly ranked (i.e., without common spurious cues) images, resulting in improved effective robustness to distribution shifts (ObjectNet, ImageNet-R, ImageNet-Sketch). We also propose a second metric to assess feature reliability, finding that spurious features are generally less reliable than non-spurious (core) ones, though again, spurious features can be more reliable for certain classes. To enable our analysis, we annotated $5,000$ feature-class dependencies over {\it all} of ImageNet as core or spurious using minimal human supervision. Finally, we show that the feature discovery and spuriosity ranking framework can be extended to other datasets like CelebA and WaterBirds in a lightweight fashion with only linear layer training, leading to the discovery of a previously unknown racial bias in CelebA hair classification.
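As a rough illustration of the spurious gap described above, the sketch below computes the accuracy difference between the most and least spurious images of a single class. The function name `spurious_gap`, the per-image `spuriosity` scores, and the cutoff `k` are assumptions made for this example; the abstract does not specify how many images form each group.

```python
from typing import Sequence


def spurious_gap(spuriosity: Sequence[float], correct: Sequence[bool], k: int) -> float:
    """Accuracy on the k highest-spuriosity images minus accuracy on the k lowest.

    `spuriosity[i]` is an assumed per-image score for one class, and `correct[i]`
    records whether the model classified image i correctly.
    """
    if len(spuriosity) != len(correct):
        raise ValueError("spuriosity and correct must have the same length")
    if not 0 < k <= len(spuriosity) // 2:
        raise ValueError("k must be positive and at most half the number of images")
    order = sorted(range(len(spuriosity)), key=lambda i: spuriosity[i])
    low, high = order[:k], order[-k:]

    def accuracy(indices):
        return sum(bool(correct[i]) for i in indices) / len(indices)

    return accuracy(high) - accuracy(low)


# Usage: four images of one class, two per group.
gap = spurious_gap([0.9, 0.1, 0.8, 0.2], [True, False, True, True], k=2)
```

Under this convention a positive gap corresponds to a clarifying spurious feature and a negative gap to a confusing one.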
Recommender systems are ubiquitous in most of our interactions in the current digital world. Whether shopping for clothes, scrolling YouTube for exciting videos, or searching for restaurants in a new city, the recommender systems at the back-end power these services. Most large-scale recommender systems are huge models trained on extensive datasets and are black boxes to both their developers and end-users. Prior research has shown that providing recommendations along with the reasons for them enhances the trust, scrutability, and persuasiveness of recommender systems. Recent literature in explainability has been inundated with works proposing several algorithms to this end. Most of these works provide item-style explanations, i.e., `We recommend item A because you bought item B.' We propose a novel approach, RecXplainer, to generate more fine-grained explanations based on the user's preference over the attributes of the recommended items. We perform experiments using real-world datasets and demonstrate the efficacy of RecXplainer in capturing users' preferences and using them to explain recommendations. We also propose ten new evaluation metrics and compare RecXplainer to six baseline methods.
Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection, or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization, and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advancements to tabular data, researchers deal with data heterogeneity, variations in customer engagement history, or the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history, and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR (Customer Activity Sequence-based Prediction and Representation) applies a Transformer architecture to encode activity sequences, improving model performance and avoiding bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.
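Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them because they use disparate benchmark datasets and metrics (e.g., J&F, mAP, sMOTSA). As a result, published works usually target a particular benchmark and are not easily comparable to each other. We believe that the development of generalized methods that can tackle multiple tasks requires greater cohesion among these research sub-communities. In this paper, we aim to facilitate this by proposing BURST, a dataset that contains thousands of videos with high-quality object masks, and an associated benchmark with six tasks involving object tracking and segmentation in video. All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison and hence more effectively pool knowledge from different methods across different tasks. Additionally, we demonstrate several baselines for all tasks and show that approaches for one task can be applied to another with a quantifiable and explainable performance difference. Dataset annotations and evaluation code are available at: https://github.com/ali2500/burst-benchmark.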
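Visual Question Answering (VQA) is a multimodal task that involves answering questions about an input image, which requires semantically understanding the image content and responding in natural language. Given the range of questions a VQA system can answer, using VQA for disaster management is an important line of research. The main challenge, however, is the delay caused by generating labels when assessing the affected areas. To tackle this, we deployed a pre-trained CLIP model, which is trained on image-text pairs. However, we empirically see that the model has poor zero-shot performance. We therefore instead use the pre-trained text and image embeddings from this model for our supervised training and surpass the previous state-of-the-art results on the FloodNet dataset. We extend this to a continual setting, which is a more realistic scenario. We tackle the problem of catastrophic forgetting using various experience replay methods. Our training runs are available at: https://wandb.ai/compyle/continual_vqa_final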
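The abstract above does not say which experience replay methods were used. As one common variant, a replay buffer filled by reservoir sampling can be sketched as follows; the class name `ReservoirReplayBuffer` and its parameters are hypothetical and are not taken from the paper.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-size memory of past examples, filled by reservoir sampling.

    Every example seen so far has the same probability of being in the buffer,
    so earlier tasks stay represented when later tasks arrive.
    """

    def __init__(self, capacity: int, seed: int = 0):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, batch_size: int) -> list:
        """Draw stored examples without replacement to mix into the current batch."""
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


# Usage: stream 100 examples through a buffer that holds 5.
buffer = ReservoirReplayBuffer(capacity=5)
for example_id in range(100):
    buffer.add(example_id)
replayed = buffer.sample(3)
```

During continual training, the replayed examples would be concatenated with the batch from the current task before each update.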
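This paper describes our contributions to the Shared Task of the 9th Workshop on Argument Mining (2022). Our approach uses large language models for the task of argument quality prediction. We perform prompt engineering using GPT-3 and also investigate the training paradigms of multi-task learning, contrastive learning, and intermediate-task training. We find that a mixed prediction setup outperforms single models. Prompting GPT-3 works best for predicting argument validity, while argument novelty is best estimated by a model trained using all three training paradigms.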
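Mixed Integer Programs (MIPs) are typically solved by the branch-and-bound algorithm. Recently, learning fast approximations that imitate the expert strong branching heuristic has gained attention due to its success in reducing the running time for solving MIPs. However, existing learning-to-branch methods assume that the entire training data is available in a single training session. This assumption often does not hold, and if the training data is supplied in a continual fashion over time, existing techniques suffer from catastrophic forgetting. In this work, we study the hitherto unexplored paradigm of lifelong learning to branch on Mixed Integer Programs. To mitigate catastrophic forgetting, we propose LIMIP, which is powered by the idea of modeling an MIP instance as a bipartite graph, which we map to an embedding space using a bipartite Graph Attention Network. This rich embedding space avoids catastrophic forgetting through the application of knowledge distillation and elastic weight consolidation, wherein we learn which parameters are key to retaining efficacy, and these are therefore protected from significant drift. We evaluate LIMIP on a series of NP-hard problems and establish that, in comparison to existing baselines, LIMIP is up to 50% better when confronted with lifelong learning.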
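Elastic weight consolidation, named in the abstract above, is usually written as a quadratic penalty that anchors important parameters to their previously learned values. The sketch below shows that standard penalty in isolation. It is not LIMIP's implementation; the function name `ewc_penalty`, the flat parameter lists, and the weight `lam` are assumptions for illustration.

```python
from typing import Sequence


def ewc_penalty(params: Sequence[float], old_params: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_old_i) ** 2.

    `fisher[i]` is an importance estimate for parameter i (typically the diagonal
    of the Fisher information from the earlier task). Large values make drift in
    that parameter expensive, which protects it during later training.
    """
    if not (len(params) == len(old_params) == len(fisher)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


# Usage: only the first parameter moved, and it has importance 2.0.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], fisher=[2.0, 5.0], lam=4.0)
```

In training, this term would be added to the loss of the new task, so that the optimizer balances fitting new data against preserving important old parameters.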
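Deep learning networks have demonstrated high performance in a wide variety of applications, such as image classification, speech recognition, and natural language processing. However, they have a major vulnerability that adversarial attacks exploit. An adversarial attack alters the input image only slightly, so that the change is nearly undetectable to the naked eye, yet it causes the network to produce a very different classification. This paper explores the projected gradient descent (PGD) attack and the Adaptive Mask Segmentation Attack (ASMA) on the DeepLabV3 image segmentation model using two types of architectures: MobileNetV3 and ResNet50. We found that PGD was very consistent in changing the segmentation to its target, while ASMA generalized less effectively to multiclass targets. The existence of such attacks nevertheless puts all image classification deep learning networks at risk of exploitation.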
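To make the PGD update concrete, here is a minimal sketch of a targeted L-infinity PGD attack. As a loud simplification, it attacks a toy logistic-regression classifier with an analytic gradient rather than the DeepLabV3 segmentation model studied in the paper; the names `predict_proba` and `pgd_attack` and the parameters `eps`, `alpha`, and `steps` are assumptions for this example.

```python
import math
from typing import List, Sequence


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability of class 1 under a toy logistic-regression model."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def pgd_attack(x: Sequence[float], w: Sequence[float], b: float, target: int,
               eps: float, alpha: float, steps: int) -> List[float]:
    """Targeted PGD: step down the loss toward `target`, then project back.

    Inputs are assumed to be pixel values in [0, 1]. The result stays within
    `eps` of the original input in every coordinate.
    """
    x_adv = list(x)
    for _ in range(steps):
        p = predict_proba(x_adv, w, b)
        # Gradient of the binary cross-entropy loss w.r.t. the input is (p - target) * w.
        grad = [(p - target) * wi for wi in w]
        # Signed gradient step that decreases the loss for the target class.
        x_adv = [xa - alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Projection onto the eps-ball around x, intersected with the valid range [0, 1].
        x_adv = [min(max(xa, max(xo - eps, 0.0)), min(xo + eps, 1.0))
                 for xa, xo in zip(x_adv, x)]
    return x_adv


# Usage: push a borderline input toward class 1 with a small perturbation budget.
x = [0.5, 0.5]
w, b = [2.0, -2.0], 0.0
x_adv = pgd_attack(x, w, b, target=1, eps=0.1, alpha=0.05, steps=10)
```

For a deep network, the analytic gradient would be replaced by backpropagation through the model, while the step and projection stay the same.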